Declarative Cleaning, Analysis, and Querying of Graph-structured Data
Author: Walaa Eldin Moustafa
Abstract
Title of dissertation: Declarative Cleaning, Analysis, and Querying of Graph-structured Data. Walaa Eldin Moustafa, Doctor of Philosophy, 2013. Dissertation directed by: Professor Amol Deshpande and Professor Lise Getoor, Department of Computer Science.
Much of today’s data, including social, biological, sensor, computer, and transportation network data, is naturally modeled and represented by graphs. Typically, data describing these networks is observational, and thus noisy and incomplete. Methods for efficiently managing graph-structured data of this nature are therefore needed, especially given the abundance and increasing size of such data. In my dissertation, I develop declarative methods to perform cleaning, analysis, and querying of graph-structured data efficiently. For declarative cleaning of graph-structured data, I identify a set of primitives to support the extraction and inference of the underlying true network from observational data, and describe a framework that enables a network analyst to easily implement and combine new extraction and cleaning techniques. The task specification language is based on Datalog, with a set of extensions designed to enable different graph cleaning primitives. For declarative analysis, I introduce ‘ego-centric pattern census queries’, a new type of graph analysis query that supports searching for structural patterns in every node’s neighborhood and reporting their counts for further analysis. I define an SQL-based declarative language to support this class of queries, and develop a series of efficient query evaluation algorithms for it. Finally, I present an approach for querying large uncertain graphs that supports reasoning about uncertainty of node attributes, uncertainty of edge existence, and a new type of uncertainty, called identity linkage uncertainty, where a group of nodes can potentially refer to the same real-world entity. I define a probabilistic graph model to capture all these types of uncertainty and to resolve identity linkage merges. I propose ‘context-aware path indexing’ and ‘join-candidate reduction’ methods to efficiently enable subgraph matching queries over large uncertain graphs of this type.
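As a rough illustration of the ego-centric pattern census idea described in the abstract, the following Python sketch counts one structural pattern (triangles) inside each node's 1-hop neighborhood. The edge-list representation, the function name `ego_triangle_census`, and the choice of pattern and neighborhood radius are assumptions made for illustration. This is a brute-force sketch, not the dissertation's SQL-based language or its query evaluation algorithms.

```python
from itertools import combinations


def ego_triangle_census(edges):
    """Count, for every node, the triangles inside its 1-hop ego network.

    `edges` is an iterable of undirected (u, v) pairs (hypothetical input
    format). A triangle is counted for a node when all three of its
    vertices lie in the set made of that node and its neighbors.
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    census = {}
    for node, nbrs in adj.items():
        ego = nbrs | {node}  # the node's 1-hop neighborhood
        count = 0
        # Test every vertex triple in the neighborhood for mutual adjacency.
        for a, b, c in combinations(ego, 3):
            if b in adj[a] and c in adj[a] and c in adj[b]:
                count += 1
        census[node] = count
    return census
```

Calling `ego_triangle_census([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)])` returns one count per node, which is the shape of result a census query reports for further analysis. The triple enumeration is cubic in neighborhood size, so it only suits small graphs.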
Similar resources
Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks
Many real world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social networks; and information extraction systems processing unstructured data to convert raw text to knowledge graphs. Many previous works describe specialized ...
ArgQL: A Declarative Language for Querying Argumentative Dialogues
We introduce ArgQL, a declarative query language, which performs on a data model designed according to the principles of argumentation. Its syntax is based on Cypher (language for graph databases) and SPARQL 1.1 and is adjusted for querying dialogues, composed by sets of arguments and their interrelations. We use formal semantics to show how queries in ArgQL match against data in the argumentat...
abstractions for managing sensor network data
Sensor networking hardware, networking, and operating system software has matured to the point that the major challenges facing the field now have to do with storing, cleaning, and querying the data such networks produce. In this paper, we survey several research systems designed for managing sensor data using declarative database-like abstractions from the database community and specifically...
Linear Logic Programming for Narrative Generation
In this paper, we explore the use of Linear Logic programming for story generation. We use the language Celf to represent narrative knowledge, and its own querying mechanism to generate story instances, through a number of proof terms. Each proof term obtained is used, through a resource-flow analysis, to build a directed graph where nodes are narrative actions and edges represent inferred caus...
Benchmarking Declarative Approximate Selection
Benchmarking Declarative Approximate Selection Predicates. Oktie Hassanzadeh, Master of Science, Graduate Department of Computer Science, University of Toronto, 2007. Declarative data quality has been an active research topic. The fundamental principle behind a declarative approach to data quality is the use of declarative statements to realize data quality primitives on top of any relational data so...